Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

Authors

  • Florian Leese
  • Philipp Brand
  • Andrey Rozenberg
  • Christoph Mayer
  • Shobhit Agrawal
  • Johannes Dambach
  • Lars Dietz
  • Jana S. Doemel
  • William P. Goodall-Copestake
  • Christoph Held
  • Jennifer A. Jackson
  • Kathrin P. Lampert
  • Katrin Linse
  • Jan N. Macher
  • Jennifer Nolzen
  • Michael J. Raupach
  • Nicole T. Rivera
  • Christoph D. Schubart
  • Sebastian Striewski
  • Ralph Tollrian
  • Chester J. Sands
Abstract

High throughput sequencing technologies are revolutionizing genetic research. With this "rise of the machines", genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02-25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also call for more sophisticated filtering of problematic and non-target genome sequences before markers are developed.
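
To make the marker-screening idea in the abstract concrete, here is a minimal sketch of how perfect microsatellites might be located in a single read. It is not the workflow used in the study, which the abstract does not specify. The function name find_microsatellites and the thresholds (motifs of 2-6 bp, at least 5 repeats) are illustrative assumptions.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are illustrative assumptions, not values from the study.
    """
    # A lazily matched motif followed by at least (min_repeats - 1) exact
    # copies of itself; lazy matching reports the shortest repeating unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip homopolymer runs such as AAAA..., which are not
        # microsatellites in the sense used here.
        if len(set(motif)) == 1:
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    read = "GGT" + "AC" * 6 + "TTG" + "A" * 12 + "CAG" * 5 + "T"
    for start, motif, repeats in find_microsatellites(read):
        print(start, motif, repeats)
```

A sketch like this only finds perfect repeats. Imperfect repeats, minisatellites, and the contamination screening that the abstract emphasizes would need dedicated tools and reference databases.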

Similar resources

Gene Family: Structure, Organization and Evolution

  Gene families are groups of homologous genes that share very similar sequences and may have identical functions. Members of gene families may be found in tandem repeats or interspersed throughout the genome. These sequences are copies of ancestral genes that have undergone changes. The multiple copies of each gene in a family were constructed based on gene duplicati...

Quantitation of genome damage and transcriptional profile of DNA damage response genes in human peripheral blood mononuclear cells exposed in vitro to low doses of neutron radiation

Background: Humans are exposed to ionizing radiation from different sources, including natural, occupational, medical, and accidental exposures. Evaluating the effect of low-level neutron exposure on human cells in vitro has important implications for human health. Attempts were made to measure genome damage, transcriptional profile of DNA damage response and repair genes in peripheral blood...

Strategies and Clinical Applications of Next Generation Sequencing

DNA sequencing is one of the most valuable techniques in molecular biology and can be used to determine the sequence of nucleotides in a DNA fragment. High-throughput sequencing, known as Next Generation Sequencing (NGS), has revolutionized genomic research and molecular biology; as a result, the whole human genome can be sequenced at low cost in several days. NGS technology is simi...

Detection and phylogenetic assessment of conserved synteny derived from whole genome duplications.

Identification of intragenomic conservation of gene compositions in multiple chromosomal segments led to evidence of whole genome duplications (WGDs). The process by which WGDs have been maintained and decayed provides us with clues for understanding how the genome evolves. In this chapter, we summarize current understanding of phylogenetic distribution and evolutionary impact of WGDs, introduc...

Apoptosis: Opening PANdora's BoX

Extracellular nucleotides have been reported to act as a 'find-me' signal in the context of phagocyte recruitment by apoptotically dying cells. A new study now examines the mechanisms of nucleotide release during apoptosis and describes the hemichannel-forming protein pannexin 1 as a crucial player in this scenario.

Journal title:

Volume 7, Issue

Pages -

Publication date: 2012